
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Use Scala in a Python Notebook &#8212; PixieDust Documentation</title>
    <link rel="stylesheet" href="_static/better.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="_static/custom.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '1.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true,
        SOURCELINK_SUFFIX: '.txt'
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="shortcut icon" href="_static/pd_icon.ico"/>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Spark Progress Monitor" href="sparkmonitor.html" />
    <link rel="prev" title="Package Manager" href="packagemanager.html" />
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
  </head>
  <body>
    <header id="pageheader"><h1><a href="index.html">
        PixieDust Documentation
    </a></h1></header>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="use-scala-in-a-python-notebook">
<h1>Use Scala in a Python Notebook<a class="headerlink" href="#use-scala-in-a-python-notebook" title="Permalink to this headline">¶</a></h1>
<div class="section" id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
<p>Python has a rich ecosystem of modules for plotting (Matplotlib), data structures and analysis (Pandas), machine learning, and natural language processing. However, data scientists working with Spark may occasionally need to call one of the hundreds of libraries available on <a class="reference external" href="https://spark-packages.org/">spark-packages.org</a> that are written in Scala or Java. Unfortunately, Jupyter Python notebooks do not currently provide a way to call Scala code. A typical workaround is to run the Scala code in a Scala notebook, persist the output somewhere such as the Hadoop Distributed File System, create a separate Python notebook, and reload the data. This is inefficient and awkward.</p>
<p>PixieDust solves this problem by letting users write and run Scala code directly in its own cell. It also lets variables be shared between Python and Scala in both directions.</p>
</div>
<div class="section" id="using-scala-cell-magic">
<h2>Using Scala cell magic<a class="headerlink" href="#using-scala-cell-magic" title="Permalink to this headline">¶</a></h2>
<p>After importing the PixieDust module, users can write Scala code in a cell that starts with the %%scala magic and run it like any other cell. Any compilation errors are reported in the cell output.
The following example shows how to run the <a class="reference external" href="https://developer.ibm.com/clouddataservices/sentiment-analysis-of-twitter-hashtags/">Watson sentiment sample app</a>, which is written in Scala:</p>
<p>First, install the JAR into the Python kernel:</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pixiedust</span>
<span class="n">pixiedust</span><span class="o">.</span><span class="n">installPackage</span><span class="p">(</span><span class="s2">&quot;https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar&quot;</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>You can now run the Scala code that uses Spark Streaming to fetch tweets:</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">scala</span>
<span class="n">val</span> <span class="n">demo</span> <span class="o">=</span> <span class="n">com</span><span class="o">.</span><span class="n">ibm</span><span class="o">.</span><span class="n">cds</span><span class="o">.</span><span class="n">spark</span><span class="o">.</span><span class="n">samples</span><span class="o">.</span><span class="n">StreamingTwitter</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;twitter4j.oauth.consumerKey&quot;</span><span class="p">,</span><span class="s2">&quot;XXXX&quot;</span><span class="p">)</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;twitter4j.oauth.consumerSecret&quot;</span><span class="p">,</span><span class="s2">&quot;XXXX&quot;</span><span class="p">)</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;twitter4j.oauth.accessToken&quot;</span><span class="p">,</span><span class="s2">&quot;XXXX&quot;</span><span class="p">)</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;twitter4j.oauth.accessTokenSecret&quot;</span><span class="p">,</span><span class="s2">&quot;XXXX&quot;</span><span class="p">)</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;watson.tone.url&quot;</span><span class="p">,</span><span class="s2">&quot;https://gateway.watsonplatform.net/tone-analyzer/api&quot;</span><span class="p">)</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;watson.tone.password&quot;</span><span class="p">,</span><span class="s2">&quot;XXXX&quot;</span><span class="p">)</span>
<span class="n">demo</span><span class="o">.</span><span class="n">setConfig</span><span class="p">(</span><span class="s2">&quot;watson.tone.username&quot;</span><span class="p">,</span><span class="s2">&quot;XXXX&quot;</span><span class="p">)</span>

<span class="kn">import</span> <span class="nn">org.apache.spark.streaming._</span>
<span class="n">demo</span><span class="o">.</span><span class="n">startTwitterStreaming</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">Seconds</span><span class="p">(</span><span class="mi">30</span><span class="p">))</span>
</pre></div>
</div>
</div></blockquote>
</div>
<div class="section" id="variable-transfer">
<h2>Variable transfer<a class="headerlink" href="#variable-transfer" title="Permalink to this headline">¶</a></h2>
<p>Every variable defined in Python is accessible in Scala.
For example:</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Define variables in Python</span>
<span class="n">var1</span><span class="o">=</span><span class="s2">&quot;Hello&quot;</span>
<span class="n">var2</span><span class="o">=</span><span class="mi">200</span>
</pre></div>
</div>
</div></blockquote>
<p>You can then access these variables in a Scala cell:</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">scala</span>
<span class="n">println</span><span class="p">(</span><span class="n">var1</span><span class="p">)</span>
<span class="n">println</span><span class="p">(</span><span class="n">var2</span> <span class="o">+</span> <span class="mi">10</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>Likewise, variables defined in Scala are transferred to Python when their names are prefixed with __ (two underscores). The following example creates a DataFrame of the tweets and transfers it into the Python shell namespace:</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">scala</span>
<span class="n">val</span> <span class="n">demo</span> <span class="o">=</span> <span class="n">com</span><span class="o">.</span><span class="n">ibm</span><span class="o">.</span><span class="n">cds</span><span class="o">.</span><span class="n">spark</span><span class="o">.</span><span class="n">samples</span><span class="o">.</span><span class="n">StreamingTwitter</span>
<span class="n">val</span> <span class="p">(</span><span class="n">__sqlContext</span><span class="p">,</span> <span class="n">__df</span><span class="p">)</span> <span class="o">=</span> <span class="n">demo</span><span class="o">.</span><span class="n">createTwitterDataFrames</span><span class="p">(</span><span class="n">sc</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>Then use the __df variable in Python:</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">__df</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
</pre></div>
</div>
</div></blockquote>
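<p>The two directions can be combined in a single round trip. The three cells below are a minimal sketch and are not taken from the sample app: the names greeting, count, __message and __total are hypothetical, and the sketch assumes that simple values such as strings and integers travel back to Python the same way the DataFrame does above.</p>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Python cell: define the values to share with Scala</span>
<span class="kn">import</span> <span class="nn">pixiedust</span>
<span class="n">greeting</span> <span class="o">=</span> <span class="s2">&quot;Hello&quot;</span>
<span class="n">count</span> <span class="o">=</span> <span class="mi">200</span>
</pre></div>
</div>
</div></blockquote>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">scala</span>
<span class="c1">// Scala cell: read the Python values, then expose results using the __ prefix</span>
<span class="n">val</span> <span class="n">__message</span> <span class="o">=</span> <span class="n">greeting</span> <span class="o">+</span> <span class="s2">&quot; from Scala&quot;</span>
<span class="n">val</span> <span class="n">__total</span> <span class="o">=</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">10</span>
</pre></div>
</div>
</div></blockquote>
<blockquote>
<div><div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Python cell: the prefixed Scala values are assumed to be in the Python namespace now</span>
<span class="nb">print</span><span class="p">(</span><span class="n">__message</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">__total</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>Scala values without the __ prefix stay local to the Scala cell, so the prefix acts as an explicit opt-in for what is shared back with Python.</p>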
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table Of Contents</a></h3>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="use.html">Use PixieDust</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="install.html">Install PixieDust</a></li>
<li class="toctree-l2"><a class="reference internal" href="loaddata.html">Load Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="displayapi.html">Display Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="packagemanager.html">Package Manager</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Use Scala in a Python Notebook</a></li>
<li class="toctree-l2"><a class="reference internal" href="sparkmonitor.html">Spark Progress Monitor</a></li>
<li class="toctree-l2"><a class="reference internal" href="download.html">Download Data</a></li>
<li class="toctree-l2"><a class="reference internal" href="logging.html">Logging</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="develop.html">Develop for PixieDust</a></li>
<li class="toctree-l1"><a class="reference internal" href="pixieapps.html">PixieApps</a></li>
<li class="toctree-l1"><a class="reference internal" href="pixiegateway.html">PixieGateway</a></li>
<li class="toctree-l1"><a class="reference internal" href="releasenotes.html">Release Notes</a></li>
</ul>

<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="search.html" method="get">
      <div><input type="text" name="q" /></div>
      <div><input type="submit" value="Go" /></div>
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
  <footer id="pagefooter">&copy; 2017, IBM.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a>
      1.6.3.

  </footer>

  
  </body>
</html>